In this script I’ll be looking at how to analyse the outputs from the MCMCglmm models to get the something about the elaboration/exploration stories.

I’m going to look at it using two approaches: * the blob-wise one where I’ll look at the differences between whole blobs (ellipses) in terms of elaboration/exploration (Robinson & Beckerman style); * and the tip-wise one where I’ll look at the elaboration/exploration score of the tips on the tree (cf. their blobs) relative to the whole phylogeny and to their respective blobs (clades).

Data

I’m going to focus on the data from Gavin and especially the models 5 and 6 that are:

## Loading the correct models
load("../Data/Processed/model_list.rda")
model_phylo1_clade3 <- model_list[[4]]
model_phylo4 <- model_list[[7]]

And here’s what the trait space looks like:

## Loading the data
load("../Data/Processed/morphdat.rda")
## Plotting it
colour_vector <- c("orange", "blue", "darkgreen")
plot(morphdat[, c(1,2)], pch = 19,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
legend("topleft", legend = levels(morphdat$clade), col = colour_vector, pch = 19)

Note that the PC% are from the whole PC in Gavin’s example.

Blob-wise approach

Here we’re basically comparing multidimensional ellipses (here 3D ones but I think we should push it to more dimensions!).

Phylo + clade model

First we can visualise some of these ellipses (100, randomly drawn from the posterior):

source("../Functions/plot.ellipses.R")
source("../Functions/get.covar.R")

## Selecting 100 random covariance matrices from the MCMCglmm
covar_matrices <- get.covar(model_phylo1_clade3,
                            n = 100,
                            simplify = FALSE)

## Plotting the results
plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = "none",
              col = c("grey",colour_vector),
              add = TRUE)

Because this is really messy, we can centre these ellipses on the groups ellipses average centres:

plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = "level",
              col = c("grey",colour_vector),
              add = TRUE)

We can then measure different stuff from these ellipses (e.g. going full statistics from Robinson & Beckerman).

However, here I’m just going to focus on the elaboration/exploration of these ellipses. For the three clades, I’m going to measure:

  • the angle of the clade’s ellipse to the general one (which will be their degree of exploration);
  • their scaled projection (where the ellipses all start from the point 0,0): showing how much the clade elaborates on the major phylo axis;
  • and their scaled rejection (how much they explore).

For these three metrics I expect:

  • the gulls and plovers to elaborate quiet a lot (angle ~90) compared to the sandpipers (angle ~90)
  • the gulls to explore the more followed by the plovers and then the sandpipers
  • the sandpipers to ellaborate the more followed by the gulls and then the plovers

Note that the three metrics are measured independently of the ellipse position in space and are measured in 3D.

source("../Functions/get.axes.R")
## Get all the phylo main axes
level_centred_major_axes <- get.axes(covar_matrices, centre = "level")
uncentered_major_axes <- get.axes(covar_matrices, centre = "none")

And this is what it looks like for the uncentred ones:

plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = "none",
              col = c("grey",colour_vector),
              add = TRUE)
plot.all.axes(uncentered_major_axes,
         col = c("grey",colour_vector),
         add = TRUE)

And for the centred ones:

plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
plot.ellipses(covar_matrices, centre = "level",
              col = c("grey",colour_vector),
              add = TRUE)
plot.all.axes(level_centred_major_axes,
         col = c("grey",colour_vector),
         add = TRUE)

Note that the major axes look slightly off for some of the ellipses. I think this is due to the difference in aspect ratio in both plots that exagerates vertical positions and deforms the ellipses:

plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)",
     main = "Major axes of the ellipses\n(aspect ratio = 1)",xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5))
plot.ellipses(covar_matrices, centre = "level",
              col = c("grey",colour_vector),
              add = TRUE)
plot.all.axes(level_centred_major_axes,
         col = c("grey",colour_vector),
         add = TRUE)

This is an example with just the major axes (projected in 2D):

plot.all.axes(uncentered_major_axes,
         col = c("grey",colour_vector),
         add = FALSE, main = "Major axes (un-centred)")

We can then compare these vectors per blobs. We in order to facilitate the vectors comparisons, we will translate the groups vectors (clade ones) on the origin of the phylo vector.

source("../Functions/vector.diff.R")
## Get all the angles and stuff
gull_results <- mapply(vector.diff, uncentered_major_axes$cladegulls, uncentered_major_axes$phylo)
plovers_results <- mapply(vector.diff, uncentered_major_axes$cladeplovers, uncentered_major_axes$phylo)
sandpipers_results <- mapply(vector.diff, uncentered_major_axes$cladesandpipers, uncentered_major_axes$phylo)

## Sort the results per type
angles <- cbind(gull_results["angle", ],
                plovers_results["angle", ],
                sandpipers_results["angle", ])
projections <- cbind(gull_results["projection", ],
                     plovers_results["projection", ],
                     sandpipers_results["projection", ])
rejections <- cbind(gull_results["rejection", ],
                    plovers_results["rejection", ],
                    sandpipers_results["rejection", ])
colnames(angles) <- colnames(projections) <- colnames(rejections) <- levels(morphdat$clade)
## Plot the blob wise metrics
par(mfrow = c(3, 1))
boxplot(angles, col = colour_vector, ylab = "angle")
boxplot(projections, col = colour_vector, ylab = "projection (scaled)")
boxplot(rejections, col = colour_vector, ylab = "rejection (scaled)")

Same but with absolute values and angles modulo 90:

## Plot the blob wise metrics
par(mfrow = c(3, 1))
boxplot(angles %% 90, col = colour_vector, ylab = "angle (modulo 90)")
boxplot(abs(projections), col = colour_vector, ylab = "projection (scaled - absolute)")
boxplot(abs(rejections), col = colour_vector, ylab = "rejection (scaled - absolute)")

Phylo clade model

For this example, the MCMCglmm model has four random (phylo) effects (one for the whole phylo and one per clade) rather than just the one for the whole phylogeny. The model seems to struggle converging and there are things off with the centring (they seem to be all centred no matter what) but that last bit should not be a problem for comparing the vectors since they are translated anyways.

plot(morphdat[, c(1,2)], pch = 19, cex = 0.5,
     col = colour_vector[morphdat$clade],
     xlab = "PC1 (90.5%)",
     ylab = "PC1 (6.86%)")
plot.ellipses(model_phylo4, n = 100,
              centre = "none", add = TRUE, col = c("grey",colour_vector, "pink"))

The pink ellipse is representing the residuals.

And we can still do the whole axis stuff, same way as above:

## Get the covariance matrices
covar_matrices2 <- get.covar(model_phylo4,
                            n = 100,
                            simplify = FALSE)
## Drop the residuals
covar_matrices2 <- covar_matrices2[-5, ]

## Get all the phylo main axes
uncentered_major_axes2 <- get.axes(covar_matrices2, centre = "none")
names(uncentered_major_axes2) <- c("phylo", "cladegulls", "cladeplovers", "cladesandpipers")

Here’s what the axes look like without the ellipses and stuff:

plot.all.axes(uncentered_major_axes2,
         col = c("grey",colour_vector),
         add = FALSE, main = "Major axes (centred)")

source("../Functions/vector.diff.R")
## Get all the angles and stuff
gull_results <- mapply(vector.diff, uncentered_major_axes2$cladegulls, uncentered_major_axes2$phylo)
plovers_results <- mapply(vector.diff, uncentered_major_axes2$cladeplovers, uncentered_major_axes2$phylo)
sandpipers_results <- mapply(vector.diff, uncentered_major_axes2$cladesandpipers, uncentered_major_axes2$phylo)

## Sort the results per type
angles <- cbind(gull_results["angle", ],
                plovers_results["angle", ],
                sandpipers_results["angle", ])
projections <- cbind(gull_results["projection", ],
                     plovers_results["projection", ],
                     sandpipers_results["projection", ])
rejections <- cbind(gull_results["rejection", ],
                    plovers_results["rejection", ],
                    sandpipers_results["rejection", ])
colnames(angles) <- colnames(projections) <- colnames(rejections) <- levels(morphdat$clade)
## Plot the blob wise metrics
par(mfrow = c(3, 1))
boxplot(angles, col = colour_vector, ylab = "angle")
boxplot(projections, col = colour_vector, ylab = "projection (scaled)")
boxplot(rejections, col = colour_vector, ylab = "rejection (scaled)")

## Plot the blob wise metrics
par(mfrow = c(3, 1))
boxplot(angles %% 90, col = colour_vector, ylab = "angle (modulo 90)")
boxplot(abs(projections), col = colour_vector, ylab = "projection (scaled - absolute)")
boxplot(abs(rejections), col = colour_vector, ylab = "rejection (scaled - absolute)")

Tip-wise approach

For the tip wise approach, the method is pretty straightforward from what I’ve done previously: we just measure the projection/rejection (elaboration/exploration) on each for each axis that we’ve drawn above. We thus end up with 100 projection/rejection score for each element (tip) in the tree for:

library(dispRity)
## Wrapping function
lapply.projections <- function(one_axis, trait_space) {
    return(list(
        projection = projections(as.matrix(trait_space), point1 = one_axis[1,], point2 = one_axis[2,]),
        rejection  = projections(as.matrix(trait_space), point1 = one_axis[1,], point2 = one_axis[2,], measure = "distance")))
}
## Plot rejection/projection profiles
plot.proj.rej <- function(point_list, cols, data, main) {
    ## Colour vector
    colour_vector <- cols[morphdat$clade]
    colour_vector <- adjustcolor(colour_vector, alpha.f = 1/10)
    ## lims
    lim <- range(unlist(point_list))
    plot(NULL, xlim = lim, ylim = lim, xlab = "projection", ylab = "rejection", main = main)
    ## Add each points
    plot.points <- function(one_set, colour_vector) {
        points_args <- list()
        points_args$pch <- 19
        points_args$col <- colour_vector
        points_args$x <- one_set$projection
        points_args$y <- one_set$rejection
        do.call(points, points_args)
    }

    lapply(point_list, plot.points, colour_vector)
    return(invisible())
}

## Getting the results
phylo_results_uncentered <- lapply(uncentered_major_axes$phylogeny, lapply.projections, trait_space = morphdat[, c(1,2,3)])

## Plot results
plot.proj.rej(phylo_results_uncentered, cols = colour_vector, data = morphdat, main = "Phylo elaboration/rejection")

But maybe it’s easier to boxplot that to get the info by clade

boxplot.proj.rej <- function(point_list, data, cols) {
    ## Get the classifier
    classifier <- data$clade    
    sort.clade.rej <- function(data, classifier) {
        return(data.frame("points" = data$rejection,
                          "classifier" = classifier))
    }
    sort.clade.proj <- function(data, classifier) {
        return(data.frame("points" = data$projection,
                          "classifier" = classifier))
    }
    ## Sort the data per clade
    rejections <- do.call(rbind, lapply(point_list, sort.clade.rej, classifier))
    projections <- do.call(rbind, lapply(point_list, sort.clade.proj, classifier))
    par(mfrow = c(2,1))
    boxplot(points ~ classifier, data = projections, col = cols, main = "Phylo projections\n(projected on the whole phylo effect)")
    boxplot(points ~ classifier, data = rejections, rejections, col = cols, main = "Phylo rejections")
}

boxplot.proj.rej(phylo_results_uncentered, cols = colour_vector, data = morphdat)

And now we can see the differences for the within clades (i.e. the elaboration/exploration within each clade). The idea here is to distinguish between extra special taxa. For example, in mammals the platypus and the naked mole rat are special, however, the platypus is actually not special among platypuses whereas the naked mole rat is special among rodents!

## The within clades explo/elabo
gulls_results_uncentered <- lapply(uncentered_major_axes$cladegulls, lapply.projections, trait_space = morphdat[which(morphdat$clade == "gulls"), c(1,2,3)])
plover_results_uncentered <- lapply(uncentered_major_axes$cladeplovers, lapply.projections, trait_space = morphdat[which(morphdat$clade == "plovers"), c(1,2,3)])
sandpipers_results_uncentered <- lapply(uncentered_major_axes$cladesandpipers, lapply.projections, trait_space = morphdat[which(morphdat$clade == "sandpipers"), c(1,2,3)])



projection_list <- list("gulls"   = unlist(lapply(gulls_results_uncentered, `[[`, "projection")),
             "plovers" = unlist(lapply(plover_results_uncentered, `[[`, "projection")),
             "sandpipers" = unlist(lapply(sandpipers_results_uncentered, `[[`, "projection")))


rejection_list <- list("gulls"   = unlist(lapply(gulls_results_uncentered, `[[`, "rejection")),
             "plovers" = unlist(lapply(plover_results_uncentered, `[[`, "rejection")),
             "sandpipers" = unlist(lapply(sandpipers_results_uncentered, `[[`, "rejection")))
par(mfrow = c(2,1))
boxplot(projection_list, main = "projection within clades\n(projected on the clade specific effect)", col  = colour_vector)
boxplot(rejection_list, main = "rejection within clades", col  = colour_vector)

They seem a bit off as well. Maybe it’ll be worth centering their axis on their trait space centroid.

TODO.

Ideas for the MCMC

  1. Check on a big global model: a) how long it takes to get 1000 gen. (or so) and b) how long the burnin phase takes.
  2. Use info 1a and 1b to make design “mini chains” (i.e. chains that just go past the burnin and run for an extra ~1000 gen.); the conservative assumption is that post-burnin, the chain has reach a local optimum that exists within the distribution of the global (true) optimum.
  3. Run 3 mini chains per tree on 100 trees (resulting in 300 mini chains) to get phylogenetic uncertainty into account. For each triplet of models, discard any that’s way off (Natalie thesis style for morpho measurements). This results in N models that have reach N local optimums while taking into account phylo uncertainty.
  4. Discard any of the N models that is way off (outliers). We assume that these outliers were chains stuck on local optimums that were outside to global optimum distribution. We then end up with M (M<=N) models that are the most conservative representation of the model because they take into account phylo uncertainty AND optimum uncertainty (i.e. if running one big model, you have to make the assumption that the local optimum reached equal the global optimum, here we can skip that assumption by saying that our distribution of local optimums contains the global optimum).
  5. Combine the results of all the MCMC into one BIG posterior distribution; randomly sample n (=Mx5?) samples (or any other number) from that distribution and work from there! (if we don’t discard any models M = 300 mini chains of >=1000 samples each, thus the sample represents ~5% of all the data).